Estimating Semantic Similarity between Expanded Query and Tweet Content for Microblog Retrieval
Authors
Abstract
This paper describes the systems we submitted to the TREC 2014 Microblog Track, which focuses on ad hoc retrieval (i.e., retrieving the top 1,000 relevant tweets for each given topic). We adopted a two-stage framework. First, we performed query expansion, enriching each query with related information drawn from pseudo-relevance feedback and Google search engine results, in order to retrieve more relevant tweets. Second, we extracted several effective semantic features (e.g., Jensen-Shannon distance, overlap similarity, Lucene score) from the retrieved results and used them to train a ranking model with supervised machine learning algorithms, which then re-ranked the results. Our systems ranked 3rd out of 21 teams.
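As a rough illustration of two of the features named in the abstract, the sketch below computes a Jensen-Shannon distance and an overlap similarity between an expanded query and a tweet. It is not the authors' implementation: whitespace-style token lists, unsmoothed term-frequency distributions, base-2 logarithms, and the overlap coefficient (shared terms divided by the size of the smaller term set) are all assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def term_distribution(tokens):
    """Turn a token list into a unigram probability distribution (no smoothing)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence (base-2 logs), in [0, 1].

    0 means identical distributions; 1 means no shared terms.
    """
    divergence = 0.0
    for term in set(p) | set(q):
        p_t = p.get(term, 0.0)
        q_t = q.get(term, 0.0)
        m_t = 0.5 * (p_t + q_t)  # mixture distribution
        if p_t > 0:
            divergence += 0.5 * p_t * math.log2(p_t / m_t)
        if q_t > 0:
            divergence += 0.5 * q_t * math.log2(q_t / m_t)
    return math.sqrt(max(divergence, 0.0))


def overlap_similarity(tokens_a, tokens_b):
    """Overlap coefficient: shared terms over the size of the smaller term set.

    Assumed definition; the paper may define overlap differently.
    """
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / min(len(set_a), len(set_b))


# Hypothetical expanded query and tweet, already tokenized and lowercased.
expanded_query = ["trec", "microblog", "track", "tweet", "retrieval"]
tweet = ["our", "microblog", "retrieval", "system", "for", "trec"]

features = {
    "js_distance": jensen_shannon_distance(
        term_distribution(expanded_query), term_distribution(tweet)
    ),
    "overlap": overlap_similarity(expanded_query, tweet),
}
```

In a re-ranking setup of the kind the abstract describes, values like these would be computed for each retrieved tweet and passed, together with the retrieval score, to a supervised ranking model.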
Similar resources
Siena's Twitter Information Retrieval System: The 2012 Microblog Track
Since 1992, the National Institute of Standards and Technology (NIST) has hosted the annual Text Retrieval Conference (TREC). One of the newest tracks, which started in 2011, is the Microblog Track, which uses a well-known social network site, Twitter[10], as its source of microblog data. Twitter allows its users to post 140-character tweets to share messages with their followers...
HU DB at TREC 2014 Microblog Track
This paper describes our system for the Tweet Timeline Generation (TTG) task of the Microblog track, at the Text Retrieval Conference (TREC) 2014. Intuitively, given a collection of microblog posts (i.e., tweets), and a keyword query Q, the goal is to generate a timeline of related tweets. Such a timeline consists of representative tweets, relevant to Q. In our system we employ query expansion ...
Query Expansion and Message-Passing Algorithms for TREC Microblog Track
This report describes the methods that our Information Retrieval Group at Purdue University used for the TREC Microblog 2011 track. The first method is pseudo-relevance feedback, a traditional algorithm that reformulates the query by adding expansion terms to it. The second method is affinity propagation, a non-parametric clustering algorithm that can group the top tweets according t...
Summarizing Disaster Related Event from Microblog
The Information Retrieval Lab at DA-IICT India participated in text summarization of the Data Challenge track of SMERP 2017. SMERP 2017 track organizers have provided the Italy earthquake tweet dataset along with the set of topics which describe important information required during any disaster related incident. The main goal of this task is to gather how well the participant’s system summariz...
Semiautomatic Image Retrieval Using the High Level Semantic Labels
Content-based image retrieval and text-based image retrieval are two fundamental approaches in the field of image retrieval. The challenges of each approach have led researchers to combine them and to use semi-automatic retrieval, in which user interaction is part of the retrieval cycle. Hence, in this paper, an image retrieval system is introduced that provides two kinds of qu...